The concept of k-FWER has received much attention lately as an
appropriate error rate for multiple testing when one seeks to control
at least $k$ false rejections, for some fixed $k > 1$. A less conservative
notion, the k-FDR, has been introduced very recently by Sarkar [Ann.
Statist. 34 (2006) 394-415], generalizing the false discovery rate of
Benjamini and Hochberg [J. Roy. Statist. Soc. Ser. B 57 (1995) 289-300].
In this article, we bring newer insight to the k-FDR considering
a mixture model involving independent p-values before motivating
the developments of some new procedures that control it. We prove
the k-FDR control of the proposed methods under a slightly weaker
condition than in the mixture model. We provide numerical evidence
of the proposed methods' superior power performance over some
k-FWER and k-FDR methods. Finally, we apply our methods to a real
data set.

1. Introduction. The classical idea of controlling at least one false
discovery has been generalized recently to that of controlling at least $k$ false
discoveries, for some fixed $k > 1$. The rationale behind it has been that often
in practice one is willing to tolerate a few false rejections, so by controlling
$k$ or more false rejections the ability of a procedure to detect more false
null hypotheses can potentially be improved. The k-FWER, the probability
of at least $k$ false rejections, is one such generalized error rate that has
received considerable attention [5, 6, 12, 13, 15, 16, 19, 20, 27]. With $V_n$ and
$R_n$ denoting, respectively, the total number of false rejections and the total
number of rejections of null hypotheses in testing $n$ null hypotheses, it is
defined as

$$k\text{-FWER} = \Pr\{V_n \ge k\}, \tag{1.1}$$

generalizing the traditional familywise error rate (FWER). Sarkar [19] has
introduced the following alternative error rate generalizing the usual false
discovery rate (FDR) of Benjamini and Hochberg [1]:



$$k\text{-FDR} = E(k\text{-FDP}), \quad \text{where } k\text{-FDP} = \frac{V_n\, I(V_n \ge k)}{R_n \vee 1}, \tag{1.2}$$

with $I(A)$ denoting the indicator of the event $A$ and $R_n \vee 1 = \max(R_n, 1)$.
It is the expected ratio of $k$ or more false rejections to all rejections of null
hypotheses, and, as k-FDR $\le$ k-FWER, controlling it is a less conservative
approach than controlling the k-FWER.

Given p-values corresponding to the null hypotheses, Sarkar [19] provided
a stepup k-FDR procedure utilizing the $k$th order joint null distributions
of the p-values. It was assumed that these p-values are either independent
or positively dependent in a sense slightly stronger than assumed for $k = 1$
in proving the FDR control of the Benjamini-Hochberg (BH) procedure
[3, 17]. Later, Sarkar and Guo [22] have given stepup as well as stepdown
procedures based on the bivariate null distributions of the p-values, assuming
the p-values are independent or positively dependent in the same sense as
when $k = 1$.

Alternative k-FDR procedures with independent p-values are constructed
in this article taking the approach of conservatively estimating the FDR for
a fixed rejection region and using these estimates to produce FDR controlling
procedures, as in [23, 24, 28]. For a single-step test with a nonrandom
threshold, we derive a formula for the k-FDR of the test under the mixture
model considered in [23] and many other subsequent papers. The formula
offers a new insight into the notion of k-FDR in relation to that of the FDR.
It provides a simple and intuitive upper bound to the k-FDR that can be
thought of as a scaled version of the FDR, with the $(k-1)$-FWER in testing
$n - 1$ null hypotheses being the scale factor. Motivated by this, we consider
conservative point estimates of the product of FDR and the probability of at
least $k - 1$ false rejections while testing $n - 1$ null hypotheses, given a fixed
rejection region for each null hypothesis. Then we develop through these
estimates procedures (stepwise) that control the k-FDR at a given level.

One of the new k-FDR procedures developed is a generalized version of
the BH FDR procedure, Procedure 1. Others are improved versions of
Procedure 1 using a class of estimates of the number of true null hypotheses.
The k-FDR control of these procedures is proved assuming that the p-values
are independent with each having the $U(0,1)$ distribution when the
corresponding null hypothesis is true. This is a slightly weaker assumption than
the i.i.d. mixture model.

The performances of our procedures are numerically compared with other
relevant procedures. It is important to point out that while we are in the
paradigm of controlling $k$ false rejections, k-FWER and other k-FDR
procedures should be the relevant competitors. With that in mind, we numerically
compare Procedure 1 with two k-FWER procedures in Sarkar [20] to see the
extent of power improvement we have in a k-FDR procedure over a k-FWER
procedure. This improvement is seen to be quite significant, especially for
a large number of hypotheses.

Considering an ideal situation where the number of true null hypotheses
is given to us by an oracle, we determine the oracle procedure. It is a stepup
procedure that mimics Procedure 1 with the number of true null hypotheses
assumed known. We numerically compare the powers of different k-FDR
procedures, those proposed here (Procedure 1 and its modification with a
particular choice of the estimate of true null hypotheses) and the one in
Sarkar [19], relative to the power of the oracle procedure.

Although this paper is motivated by the work of [23, 24], we have not fully
pursued their line of research here. We keep our focus mainly on developing
procedures controlling the k-FDR and not on estimating it. Furthermore,
we have not taken the route of generalizing Storey's concept of positive FDR
and the related q-value method. Finally, we obtain our results only in the
finite sample setting.

The layout of the paper is as follows. The k-FDR formula under the
mixture model is given in Section 2. Having briefly introduced in Section 3
a class of conservatively biased point estimates of the k-FDR based on this
formula, we motivate our procedures controlling the k-FDR in Section 4.
The findings of numerical studies are presented in Section 5. An application
to a real data set is provided in Section 6. Some concluding remarks and
additional numerical investigations are made in Section 7. Proofs of all the
main results are given in the Appendix.

2. The k-FDR under mixture model. Given $n$ null hypotheses $H_1, \ldots, H_n$,
consider testing if $H_i = 0$ (true) or $H_i = 1$ (false) simultaneously for
$i = 1, \ldots, n$, based on their respective p-values $p_1, \ldots, p_n$. We first consider a
single step multiple testing procedure rejecting each $H_i = 0$ if $p_i \le t$ for
some fixed, nonrandom $t \in (0,1)$ and derive a formula for the k-FDR of this
procedure under the following model considered in [23].

Mixture model: Let $(p_i, H_i)$, $i = 1, \ldots, n$, be i.i.d. as $(p, H)$, where

$$\begin{aligned}
\Pr(p \le u \mid H) &= (1 - H)u + H F_1(u), \quad u \in (0,1), \\
\Pr(H = 0) &= \pi_0 = 1 - \Pr(H = 1),
\end{aligned} \tag{2.1}$$

for some cdf $F_1(u)$.
Theorem 2.1. Let

$$V_n(t) = \sum_{i=1}^{n} I(p_i \le t, H_i = 0), \qquad R_n(t) = \sum_{i=1}^{n} I(p_i \le t). \tag{2.2}$$

Then, under the above mixture model, the k-FDR of the single-step test
rejecting each $H_i = 0$ if $p_i \le t$ is given by

$$k\text{-FDR}_n(t) = n\pi_0 t\, E\left\{\frac{I[V_{n-1}(t) \ge k-1]}{R_{n-1}(t)+1}\right\}. \tag{2.3}$$
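The identity in Theorem 2.1 can be checked numerically without simulation. The sketch below is only an illustration, not part of the original development: it assumes the mixture model, writes $a = \pi_0 t$ and $b = (1-\pi_0)F_1(t)$ for the per-test probabilities of a false and a true rejection, and sums the trinomial distribution of the two rejection counts exactly. The function names and the numerical inputs (including the value taken for $F_1(t)$) are hypothetical.

```python
from math import comb


def trinomial_pmf(m, v, s, a, b):
    """P(V = v, S = s) among m tests: each test is a false rejection with
    probability a, a true rejection with probability b, otherwise not rejected."""
    return comb(m, v) * comb(m - v, s) * a**v * b**s * (1.0 - a - b) ** (m - v - s)


def kfdr_by_definition(n, k, t, pi0, f1_t):
    """E{ V_n I(V_n >= k) / max(R_n, 1) }, summed over (V_n, S_n); assumes k >= 1."""
    a, b = pi0 * t, (1.0 - pi0) * f1_t
    return sum(
        v / (v + s) * trinomial_pmf(n, v, s, a, b)
        for v in range(k, n + 1)
        for s in range(n - v + 1)
    )


def kfdr_by_formula(n, k, t, pi0, f1_t):
    """n * pi0 * t * E{ I[V_{n-1} >= k-1] / (R_{n-1} + 1) }, the right side of (2.3)."""
    a, b = pi0 * t, (1.0 - pi0) * f1_t
    inner = sum(
        trinomial_pmf(n - 1, v, s, a, b) / (v + s + 1)
        for v in range(k - 1, n)
        for s in range(n - v)
    )
    return n * a * inner


# Illustrative inputs only: n = 10 tests, k = 3, threshold t = 0.1,
# pi0 = 0.8 and an assumed alternative cdf value F_1(t) = 0.6.
direct = kfdr_by_definition(10, 3, 0.1, 0.8, 0.6)
formula = kfdr_by_formula(10, 3, 0.1, 0.8, 0.6)
print(direct, formula)
```

The two numbers agree up to floating-point rounding, and for $k = 1$ the same function reproduces the familiar FDR expression $\pi_0 t\{1 - [1 - F(t)]^n\}/F(t)$.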

The formula in Theorem 2.1 provides an insight into the k-FDR as a measure
of generalized FDR as well as a direction toward developing procedures
that control it. To see this, consider first $k = 1$ and notice that for the FDR
the formula is given by

$$\mathrm{FDR}_n(t) = n\pi_0 t\, E\left\{\frac{1}{R_{n-1}(t)+1}\right\} = E[V_n(t)]\, E\left\{\frac{1}{R_{n-1}(t)+1}\right\}. \tag{2.4}$$

Of course, since $R_{n-1}(t) \sim \mathrm{Bin}(n-1, F(t))$, with $F(t) = \pi_0 t + (1-\pi_0)F_1(t)$, we
have

$$E\left\{\frac{1}{R_{n-1}(t)+1}\right\} = \frac{1 - [1 - F(t)]^n}{nF(t)} = \frac{\Pr\{R_n(t) \ge 1\}}{nF(t)},$$

that is, the formula (2.4) is the same as the following alternative formula

$$\mathrm{FDR}_n(t) = \frac{\pi_0 t}{F(t)} \Pr\{R_n(t) \ge 1\} = \frac{E[V_n(t)]}{E[R_n(t)]} \Pr\{R_n(t) \ge 1\}, \tag{2.5}$$

given in Storey [23] and commonly used in many subsequent papers. Nevertheless,
(2.4) offers a slightly different insight into the FDR than (2.5), and
it is this insight that helps us in understanding what the k-FDR means as
a generalization of the FDR. Writing the k-FDR as

$$k\text{-FDR}_n(t) = \Pr\{V_{n-1}(t) \ge k-1\}\, n\pi_0 t\, E\left\{\frac{1}{R_{n-1}(t)+1} \,\Big|\, V_{n-1}(t) \ge k-1\right\}, \tag{2.6}$$

we see that it is a combination of the $(k-1)$-FWER in testing $n - 1$ null
hypotheses and an FDR-type measure conditional on at least $k - 1$ false
rejections in testing $n - 1$ null hypotheses.

For a fixed $t$, $I[V_{n-1}(t) \ge k-1]$ is a stochastically increasing function
of $R_{n-1}(t)$, because $V_{n-1}(t)$ is so; whereas, $[R_{n-1}(t)+1]^{-1}$ is a decreasing
function of $R_{n-1}(t)$. Using these, we get

$$E\left\{\frac{I[V_{n-1}(t) \ge k-1]}{R_{n-1}(t)+1}\right\} \le \Pr\{V_{n-1}(t) \ge k-1\}\, E\left\{\frac{1}{R_{n-1}(t)+1}\right\} \tag{2.7}$$

(see Appendix A.4 for a proof). In other words, we have

$$k\text{-FDR}_n(t) \le \Pr\{V_{n-1}(t) \ge k-1\}\, \mathrm{FDR}_n(t). \tag{2.8}$$

Storey [23] estimated $\mathrm{FDR}_n(t)$, given a fixed rejection region $(0,t)$ for
each null hypothesis, by using conservative point estimates of the quantity
$\pi_0 t/F(t)$. Borrowing Storey's idea, we consider conservatively estimating the
quantity

$$\frac{\pi_0 t}{F(t)} \Pr\{V_{n-1}(t) \ge k-1\} \tag{2.9}$$

for estimating the k-FDR toward developing procedures that control it.
Before we do that in the next section, it would be interesting to see what the
quantity (2.9) means and how it is related to the original definition of the
k-FDR$(t)$.

Since $V_{n-1}(t) \sim \mathrm{Bin}(n-1, \pi_0 t)$, the probability $\Pr\{V_{n-1}(t) \ge k-1\}$ is equal
to $G(k-1, n-1, \pi_0 t)$, where

$$G(k,n,u) = \sum_{j=k}^{n} \binom{n}{j} u^j (1-u)^{n-j}, \quad 0 < u < 1. \tag{2.10}$$

Also, $E\{V_n(t) I(V_n(t) \ge k)\} = n\pi_0 t\, G(k-1, n-1, \pi_0 t)$ and
$E\{R_n(t) I(R_n(t) \ge 1)\} = nF(t)$. Thus, we see that

$$\frac{\pi_0 t}{F(t)} \Pr\{V_{n-1}(t) \ge k-1\} = \frac{E\{V_n(t) I(V_n(t) \ge k)\}}{E\{R_n(t) I(R_n(t) \ge 1)\}} = \frac{E\{V_n(t) I(V_n(t) \ge k) \mid R_n(t) \ge 1\}}{E\{R_n(t) \mid R_n(t) \ge 1\}}, \tag{2.11}$$

that is, the quantity (2.9) is the ratio of the expectations, conditional on
at least one rejection, of the numerator and denominator terms in k-FDP,
the expectation of which is the k-FDR. This is similar to what Storey [23]
noted when $k = 1$, that is, for the ratio $\pi_0 t/F(t)$. Storey also showed that
$\pi_0 t/F(t)$ is the positive false discovery rate (pFDR) defined in [23] under the
mixture model. Thus, the quantity (2.9) is also seen to be a combination of
the $(k-1)$-FWER in testing $n - 1$ null hypotheses and the pFDR. The
right-hand side ratio in (2.11) when $k = 1$ has been referred to as the marginal
FDR (mFDR) in [26] where an optimal procedure controlling the mFDR
is developed under the model (2.1), taking a compound-decision theoretic
approach to multiple testing.
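The binomial tail $G(k,n,u)$ of (2.10) is used repeatedly in what follows, so a small reference implementation may help. The sketch below is an illustration under stated assumptions: it evaluates the tail by direct summation and checks the identity $E\{V I(V \ge k)\} = nq\,G(k-1, n-1, q)$ for $V \sim \mathrm{Bin}(n, q)$ with $q = \pi_0 t$. The helper name `truncated_mean` and the numerical inputs are hypothetical.

```python
from math import comb


def G(k, n, u):
    """Upper binomial tail of (2.10): P(Bin(n, u) >= k)."""
    return sum(comb(n, j) * u**j * (1.0 - u) ** (n - j) for j in range(k, n + 1))


def truncated_mean(k, n, q):
    """E{ V I(V >= k) } for V ~ Bin(n, q), summed term by term."""
    return sum(j * comb(n, j) * q**j * (1.0 - q) ** (n - j) for j in range(k, n + 1))


# Illustrative values: n = 50 hypotheses, k = 4, pi0 = 0.8, t = 0.05.
n, k, pi0, t = 50, 4, 0.8, 0.05
q = pi0 * t
lhs = truncated_mean(k, n, q)
rhs = n * q * G(k - 1, n - 1, q)
print(lhs, rhs)
```

Direct summation is adequate for moderate $n$; for very large $n$ a library routine for the binomial survival function would be a more robust choice.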

3. Conservative point estimates of the k-FDR(t). Storey [23] proposes
the following class of conservative point estimates of FDR$(t)$:

$$\widehat{\mathrm{FDR}}_\lambda(t) = \frac{n\hat\pi_0(\lambda)\, t}{R_n(t) \vee 1}, \tag{3.1}$$

where

$$\hat\pi_0(\lambda) = \frac{n - R_n(\lambda)}{n(1-\lambda)} \tag{3.2}$$

for any $\lambda \in [0,1)$. Multiplying this with $G(k-1, n-1, t)$, a conservative
version of $\Pr\{V_{n-1}(t) \ge k-1\}$, we consider estimating the k-FDR$(t)$ as
follows:

$$\widehat{k\text{-FDR}}_\lambda(t) = \frac{n\hat\pi_0(\lambda)\, t\, G(k-1, n-1, t)}{R_n(t) \vee 1}, \quad \lambda \in [0,1). \tag{3.3}$$

Theorem 3.1. Let the p-values be independent and those corresponding
to the true null hypotheses be i.i.d. $U(0,1)$. Then,
$E(\widehat{k\text{-FDR}}_\lambda(t)) \ge k\text{-FDR}(t)$, for every fixed $\lambda \in [0,1)$.

This result follows from [23, 24]. It shows that the point estimates given
by (3.3) for the k-FDR are conservative.

Remark 3.1. A more natural way of estimating the k-FDR$(t)$ would
be to estimate $\Pr\{V_{n-1}(t) \ge k-1\}$ using $G(k-1, n-1, \hat\pi_0(\lambda)t)$, instead of
$G(k-1, n-1, t)$, and multiply this with (3.1). However, for such estimates,
Theorem 3.1 holds under certain restrictions on $\lambda$ depending on $t$.
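As a concrete illustration of the point estimate, the sketch below computes Storey's estimate $\hat\pi_0(\lambda) = (n - R_n(\lambda))/(n(1-\lambda))$ and the product $n\hat\pi_0(\lambda)\,t\,G(k-1,n-1,t)/(R_n(t) \vee 1)$ for a list of p-values. It is a minimal sketch, not the authors' code: the function names and the eight example p-values are hypothetical and chosen only so that the arithmetic can be followed by hand.

```python
from math import comb


def G(k, n, u):
    """Upper binomial tail P(Bin(n, u) >= k)."""
    return sum(comb(n, j) * u**j * (1.0 - u) ** (n - j) for j in range(k, n + 1))


def pi0_hat(pvals, lam):
    """Storey-type estimate (n - R_n(lam)) / (n * (1 - lam)) of pi0."""
    n = len(pvals)
    r_lam = sum(p <= lam for p in pvals)
    return (n - r_lam) / (n * (1.0 - lam))


def kfdr_hat(pvals, t, k, lam):
    """Point estimate n * pi0_hat * t * G(k-1, n-1, t) / max(R_n(t), 1)."""
    n = len(pvals)
    r_t = sum(p <= t for p in pvals)
    return n * pi0_hat(pvals, lam) * t * G(k - 1, n - 1, t) / max(r_t, 1)


# Hypothetical p-values, for illustration only.
pvals = [0.01, 0.02, 0.03, 0.2, 0.5, 0.7, 0.9, 0.95]
print(pi0_hat(pvals, 0.5))           # five p-values are <= 0.5, so (8 - 5) / (8 * 0.5)
print(kfdr_hat(pvals, 0.05, 1, 0.5)) # k = 1 reduces to Storey's FDR estimate
print(kfdr_hat(pvals, 0.05, 2, 0.5)) # k = 2 scales it by G(1, 7, 0.05)
```

With $k = 1$ the factor $G(0, n-1, t)$ equals one, so the estimate coincides with the FDR estimate; larger $k$ can only shrink it.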

4. Procedures controlling the k-FDR. Using $\widehat{k\text{-FDR}}_\lambda(t)$, we will now
derive a new class of k-FDR procedures. Let

$$t_\alpha(\widehat{k\text{-FDR}}_\lambda) = \sup\{0 \le t \le 1 : \widehat{k\text{-FDR}}_\lambda(t) \le \alpha\}. \tag{4.1}$$

Then, reject $H_i$ if $p_i \le t_\alpha(\widehat{k\text{-FDR}}_\lambda)$. Given $p_{1:n} \le \cdots \le p_{n:n}$, the sorted
p-values, this procedure when $\lambda = 0$ (i.e., $\hat\pi_0 = 1$) is equivalent to the
following procedure.

Procedure 1. Reject $H_{(1)}, \ldots, H_{(l)}$, where

$$l = \max\left\{1 \le i \le n : p_{i:n}\, G(k-1, n-1, p_{i:n}) \le \frac{i\alpha}{n}\right\}, \tag{4.2}$$

if the maximum exists, otherwise, reject none, where $H_{(i)}$ is the null
hypothesis corresponding to $p_{i:n}$, $i = 1, \ldots, n$.

Theorem 4.1. Procedure 1 controls the k-FDR at $\alpha$ if the p-values are
independent and those corresponding to the true null hypotheses are i.i.d.
$U(0,1)$.

Define

$$G_{k,n}(t) = t\, G(k-1, n-1, t), \quad t \in (0,1). \tag{4.3}$$

Let $G_{k,n}^{-1}$ be the inverse function of $G_{k,n}$. Then, Procedure 1 is a stepup
procedure with the critical values $\alpha_i = G_{k,n}^{-1}(i\alpha/n)$, $i = 1, \ldots, n$, generalizing
the BH procedure from an FDR to a k-FDR procedure. As $G_{k,n}(t) \le t$ and
$G_{k,n}(t)$ is increasing in $t$ (see, e.g., Result A.1 in Appendix A.2),
$G_{k,n}^{-1}(t) \ge t$. In other words, Procedure 1 is uniformly more powerful than the BH
procedure.

It is important to note that, as in a k-FWER procedure, the first $k - 1$
critical values in Procedure 1 and the one to be developed later can be
chosen arbitrarily. This is because the first $k - 1$ critical values in any stepwise
procedure have no role in defining the k-FDR of such a procedure, as the
k-FDR is zero until at least $k$ of the null hypotheses are rejected. Nevertheless,
the best way to choose these critical values would be to keep them all
constant at the $k$th critical value; see, for example, [19]. So, we consider the
first $k - 1$ critical values in our proposed k-FDR methods to be the same as the
$k$th one while comparing them with k-FWER and other k-FDR procedures.
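Procedure 1 can be sketched in a few lines. The code below is an illustrative implementation under stated assumptions, not the authors' software: the inverse of $G_{k,n}(t) = t\,G(k-1,n-1,t)$ is obtained by bisection (a choice made here for simplicity), the first $k - 1$ critical values are set equal to the $k$th as described above, and all function names are hypothetical.

```python
from math import comb


def G(k, n, u):
    """Upper binomial tail P(Bin(n, u) >= k)."""
    return sum(comb(n, j) * u**j * (1.0 - u) ** (n - j) for j in range(k, n + 1))


def G_kn(k, n, t):
    """G_{k,n}(t) = t * G(k-1, n-1, t)."""
    return t * G(k - 1, n - 1, t)


def G_kn_inverse(k, n, u, iters=60):
    """Bisection inverse of the increasing map t -> G_{k,n}(t) on [0, 1]."""
    if u >= 1.0:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if G_kn(k, n, mid) <= u:
            lo = mid
        else:
            hi = mid
    return lo


def procedure1_critical_values(n, k, alpha):
    """alpha_i = G_{k,n}^{-1}(i * alpha / n), with the first k-1 values set to the kth."""
    crit = [G_kn_inverse(k, n, i * alpha / n) for i in range(1, n + 1)]
    return [crit[k - 1]] * (k - 1) + crit[k - 1:]


def stepup_rejections(pvals, crit):
    """Number of rejections of a stepup procedure: the largest i with p_(i) <= crit_i."""
    count = 0
    for i, (p, c) in enumerate(zip(sorted(pvals), crit), start=1):
        if p <= c:
            count = i
    return count


bh = [i * 0.05 / 20 for i in range(1, 21)]          # BH critical values, n = 20
crit1 = procedure1_critical_values(20, 1, 0.05)      # k = 1 recovers BH
crit3 = procedure1_critical_values(20, 3, 0.05)      # k = 3 is uniformly larger
print(crit3[:5])
```

The hypotheses rejected are those with the `stepup_rejections(pvals, crit)` smallest p-values.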

Does Procedure 1 (with its first $k - 1$ critical values the same as the $k$th
one) provide a more powerful method of controlling $k$ false rejections than
a compatible k-FWER method? Lehmann and Romano [13] gave a stepdown
k-FWER procedure generalizing Holm's original FWER procedure
in [9]. Sarkar [20] showed that a stepup version of the procedure, which
generalizes Hochberg's procedure in [8], also controls the k-FWER under
independence or a certain type of positive dependence. Its critical values are
$\alpha_i = k\alpha/(n - i \vee k + k)$, $i = 1, \ldots, n$. Clearly, Procedure 1 is more powerful
than this procedure, as $i\alpha/n \ge k\alpha/(n + k - i)$, for all $i = k, \ldots, n$.
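The inequality behind this comparison can be verified in one line. A short worked step, using only quantities already defined:

$$\frac{i\alpha}{n} - \frac{k\alpha}{n+k-i} = \frac{\alpha\,[\,i(n+k-i) - kn\,]}{n(n+k-i)} = \frac{\alpha\,(i-k)(n-i)}{n(n+k-i)} \ge 0, \quad k \le i \le n,$$

with equality only at $i = k$ and $i = n$. Since $G_{k,n}^{-1}(u) \ge u$, the critical values of Procedure 1 satisfy $G_{k,n}^{-1}(i\alpha/n) \ge i\alpha/n \ge k\alpha/(n+k-i)$ for these $i$.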

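Theorem 4.1 establishes the k-FDR control of the single step procedure
with the random threshold $t_\alpha(\widehat{k\text{-FDR}}_{\lambda=0})$. For $\lambda > 0$, we use the threshold,
as in [24], based on the following modified version of $\widehat{k\text{-FDR}}_\lambda(t)$:

$$\widehat{k\text{-FDR}}{}^*_\lambda(t) = \begin{cases} \dfrac{n\hat\pi_0^*(\lambda)\, t\, G(k-1, n-1, t)}{R_n(t) \vee 1}, & \text{if } t \le \lambda, \\[2ex] 1, & \text{if } t > \lambda, \end{cases} \tag{4.4}$$

where

$$\hat\pi_0^*(\lambda) = \frac{n - R_n(\lambda) + 1}{n(1-\lambda)}, \tag{4.5}$$

a slight modification of Storey's [23] original estimate in (3.2). Define

$$t_\alpha(\widehat{k\text{-FDR}}{}^*_\lambda) = \sup\{0 \le t \le 1 : \widehat{k\text{-FDR}}{}^*_\lambda(t) \le \alpha\}, \tag{4.6}$$

and reject $H_i$ if $p_i \le t_\alpha(\widehat{k\text{-FDR}}{}^*_\lambda)$. Given $p_{1:n} \le \cdots \le p_{n:n}$, this is equivalent to
finding $j = \max\{1 \le i \le n : p_{i:n} \le \lambda\}$, for a fixed $\lambda \in (0,1)$, and then rejecting
$H_{(1)}, \ldots, H_{(l)}$, where

$$l = \max\left\{1 \le i \le j : p_{i:n} \le \min\left[G_{k,n}^{-1}\left(\frac{i\alpha(1-\lambda)}{n-j+1}\right), \lambda\right]\right\}, \tag{4.7}$$

if the maximums at both stages exist, otherwise not rejecting any hypothesis.
Nevertheless, we will consider slightly more conservative procedures of the
following type.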

Procedure 2. Given a fixed $\lambda \in (0,1)$, find, at the first stage,
$j = \max\{1 \le i \le n : p_{i:n} \le \lambda\}$. At the second stage, reject $H_{(1)}, \ldots, H_{(l)}$, where

$$l = \max\left\{1 \le i \le j : p_{i:n} \le \lambda \min\left[G_{k,n}^{-1}\left(\frac{i\alpha(1-\lambda)}{\lambda(n-j+1)}\right), 1\right]\right\}. \tag{4.8}$$

If the maximum does not exist at either stage, do not reject any hypothesis.

Theorem 4.2. Procedure 2, for every fixed $\lambda \in (0,1)$, controls the
k-FDR at $\alpha$ if the p-values are independent and those corresponding to the
true null hypotheses are i.i.d. $U(0,1)$.
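The two stages of Procedure 2 translate directly into code. The sketch below is hedged in one important respect: it assumes that the second-stage critical values are $\lambda \min[G_{k,n}^{-1}(i\alpha(1-\lambda)/(\lambda(n-j+1))), 1]$, which is the form that reduces, for $k = 1$, to the thresholds $\min[i\alpha(1-\lambda)/(n-j+1), \lambda]$ of the adaptive procedure in [24]. The bisection inverse, the function names and the ten example p-values are hypothetical choices made for illustration.

```python
from math import comb


def G(k, n, u):
    """Upper binomial tail P(Bin(n, u) >= k)."""
    return sum(comb(n, j) * u**j * (1.0 - u) ** (n - j) for j in range(k, n + 1))


def G_kn_inverse(k, n, u, iters=60):
    """Bisection inverse of t -> t * G(k-1, n-1, t) on [0, 1]; returns 1 if u >= 1."""
    if u >= 1.0:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid * G(k - 1, n - 1, mid) <= u:
            lo = mid
        else:
            hi = mid
    return lo


def procedure2_rejections(pvals, k, alpha, lam):
    """Number of hypotheses rejected by the two-stage adaptive stepup rule."""
    n = len(pvals)
    ordered = sorted(pvals)
    j = sum(p <= lam for p in ordered)      # first stage: how many p-values are <= lam
    if j == 0:
        return 0
    count = 0
    for i in range(1, j + 1):               # second stage: stepup search over i <= j
        u = i * alpha * (1.0 - lam) / (lam * (n - j + 1))
        crit = lam * min(G_kn_inverse(k, n, u), 1.0)   # assumed form of the critical value
        if ordered[i - 1] <= crit:
            count = i
    return count


# Hypothetical p-values, for illustration only.
pvals = [0.001, 0.004, 0.012, 0.03, 0.2, 0.4, 0.6, 0.7, 0.8, 0.95]
print(procedure2_rejections(pvals, 1, 0.05, 0.5))
print(procedure2_rejections(pvals, 2, 0.05, 0.5))
```

In this example six p-values fall below $\lambda = 0.5$, so $n - j + 1 = 5$; with $k = 1$ the thresholds are $0.005\,i$ and three hypotheses are rejected, while $k = 2$ enlarges the thresholds enough to reject a fourth.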

Remark 4.1. Let $n_0$ be the number of true null hypotheses. When $n_0 < k$,
the k-FDR is zero and hence trivially controlled. When $n_0 \ge k$, the k-FDR
of a stepup procedure with critical values $\alpha_1 \le \cdots \le \alpha_n$ is bounded above by
$n_0 \max_{k \le r \le n}\{G_{k,n_0}(\alpha_r)/r\}$ under the conditions assumed in Theorem 4.1,
as seen from (A.2) and (A.8). With unknown $n_0$, we consider the maximum
of this upper bound with respect to $n_0$, which is $n \max_{k \le r \le n}\{G_{k,n}(\alpha_r)/r\}$,
and choose the $\alpha_r$ satisfying $nG_{k,n}(\alpha_r)/r = \alpha$ that makes it equal to $\alpha$. This
is how we develop Procedure 1 generalizing the BH FDR procedure.

Procedure 2 is an adaptive k-FDR procedure generalizing that in [24].
It attempts to improve the k-FDR control of Procedure 1 by sharpening it
using an estimate of $n_0$ obtained from the available p-values. More formally,
given any $0 < \lambda < 1$, we choose the $\alpha_r$ subject to $\hat n_0(\lambda)G_{k,n}(\alpha_r)/r = \alpha$,
considering the $\hat n_0(\lambda) = [n - R_n(\lambda) + 1]/(1 - \lambda)$ used in [24], and then slightly
modify it so that we can theoretically establish the k-FDR control of the
modified adaptive procedure when $k > 1$. Of course, when $k = 1$ this
modification does not make any difference, while it results in a slightly more
conservative procedure when $k > 1$. We will explain later why it is more
conservative when $k > 1$.

A more reasonable approach to constructing a class of adaptive procedures
would be to find the $\alpha_r$ initially from $\hat n_0(\lambda)G_{k,\hat n_0(\lambda)}(\alpha_r)/r = \alpha$ before
it is modified if necessary. We have taken a slightly more conservative
approach than this, and the only reason we have done so is that we are able to
theoretically prove the k-FDR control of the resulting adaptive procedure
based on (4.8), but not of the original one based on (4.7) when $k > 1$. This
proof is an extension of a proof given in [21] and alternative to those given
in [2, 24] of the FDR control of the procedure in [24].

A careful study of our proof of the k-FDR control of Procedure 2 would
reveal, at least theoretically, that (4.5) is a natural choice for an estimate
of $n_0$ that can be used adaptively in Procedure 1 maintaining a control of
the k-FDR. Alternative and more complicated methods of estimating $n_0$ are
given in [10, 14]. However, unlike (4.5), it seems hard to prove theoretically
that using these estimates adaptively in Procedure 1 will control the
k-FDR. Of course, if these are used nonadaptively, that is, by obtaining them
independently before incorporating into Procedure 1 to find $\alpha_r$ satisfying
$\hat n_0 G_{k,n}(\alpha_r)/r = \alpha$, the k-FDR can be controlled as long as $E(1/\hat n_0) \le 1/n_0$.

If $n_0 \ge k$ were known, the least conservative stepup procedure controlling
the k-FDR at level $\alpha$ would be the one in which $\alpha_r$ satisfies
$n_0 G_{k,n_0}(\alpha_r)/r = \alpha$. This will be referred to as the oracle k-FDR procedure in this article.
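To see how much is gained by knowing $n_0$, the critical values solving $n_0 G_{k,n_0}(\alpha_r)/r = \alpha$ can be compared with those of Procedure 1, which solve $nG_{k,n}(\alpha_r)/r = \alpha$. The sketch below is illustrative only: the bisection inverse, the convention of setting the first $k - 1$ values to the $k$th, the function names and the values $n = 20$, $n_0 = 10$ are all assumptions made for the example.

```python
from math import comb


def G(k, n, u):
    """Upper binomial tail P(Bin(n, u) >= k)."""
    return sum(comb(n, j) * u**j * (1.0 - u) ** (n - j) for j in range(k, n + 1))


def inverse_G_k(k, m, u, iters=60):
    """Bisection inverse of t -> t * G(k-1, m-1, t) on [0, 1]; returns 1 if u >= 1."""
    if u >= 1.0:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid * G(k - 1, m - 1, mid) <= u:
            lo = mid
        else:
            hi = mid
    return lo


def stepup_critical_values(n, m, k, alpha):
    """alpha_r = G_{k,m}^{-1}(r * alpha / m), r = 1..n.
    m = n gives Procedure 1; m = n0 gives the oracle procedure.
    The first k-1 values are set equal to the kth."""
    crit = [inverse_G_k(k, m, r * alpha / m) for r in range(1, n + 1)]
    return [crit[k - 1]] * (k - 1) + crit[k - 1:]


n, n0, k, alpha = 20, 10, 3, 0.05
proc1 = stepup_critical_values(n, n, k, alpha)
oracle = stepup_critical_values(n, n0, k, alpha)
for r in (3, 10, 20):
    print(r, proc1[r - 1], oracle[r - 1])
```

Because $G(k,n,u)$ is nondecreasing in $n$ and $r\alpha/n_0 \ge r\alpha/n$, every oracle critical value is at least as large as the corresponding Procedure 1 value.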

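Let us now explain why Procedure 2 is more conservative when $k > 1$
than the one based on (4.7). For any $0 < \lambda < 1$ and $t > 0$ [in particular, for
$t = i\alpha(1-\lambda)/(n-j+1)$], note that

$$G_{k,n}(\lambda t) = \lambda t\, G(k-1, n-1, \lambda t) \le \lambda t\, G(k-1, n-1, t) = \lambda G_{k,n}(t).$$

Thus, applying this inequality at the point $G_{k,n}^{-1}(t/\lambda)$ (when $t/\lambda < 1$) and
inverting the increasing function $G_{k,n}$, we get

$$\lambda\, G_{k,n}^{-1}(t/\lambda) \le G_{k,n}^{-1}(t),$$

so the critical values in (4.8) never exceed those in (4.7).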



5. Numerical studies. Sarkar [20] introduced two stepup procedures that
control the k-FWER under independence. One of these, which we call the
generalized Hochberg procedure, is based on the following critical values:

$$\alpha_i^{(1)} = \frac{k\alpha}{n + k - i \vee k}, \quad i = 1, \ldots, n. \tag{5.1}$$

As said before, this is actually the stepup analog of the k-FWER procedure
derived in [13] as a generalization of the Holm procedure [9]. The other
procedure, which we call Sarkar's k-FWER procedure, is based on the following
critical values:
(5.2) «r=(«n:--^ , -1,-,- 
In addition, Sarkar [19], while introducing the notion of the k-FDR, proposed
a stepup k-FDR procedure with the following critical values:

/.,,, fc-i . \ 1//^ 

(5.3) af = (^an ' A , ^ = l,...,n. 

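It controls the k-FDR at $\alpha$ under independence. We call this procedure
Sarkar's k-FDR procedure. The oracle k-FDR procedure is a stepup procedure
with the following critical values:

$$\alpha_i^{(4)} = G_{k,n_0}^{-1}\left(\frac{i\alpha}{n_0}\right), \quad i = 1, \ldots, n, \tag{5.4}$$

with $n_0 \ge k$.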

Numerical studies were conducted, first to get an idea of how powerful
the notion of k-FDR is compared to that of the k-FWER. For that, we
considered Procedure 1 and compared it with the above k-FWER procedures,
the generalized Hochberg and Sarkar's k-FWER procedures, in terms
of their critical values and average powers. Second, we wanted to compare
the average powers of different k-FDR procedures, Procedures 1 and 2 and
Sarkar's k-FDR procedure, relative to the oracle k-FDR procedure. To recall
the definition of average power, it is the expected proportion of false nulls
that are rejected.

Figure 1 presents a comparison among Procedure 1, labeled New SU, and
the generalized Hochberg and Sarkar's k-FWER procedures, labeled GH SU
and Sarkar SU, respectively. We plot in this figure the three sequences of
constants described in (4.2), (5.1) and (5.2) for $(n,k)$ = (500, 8), (1000, 10),
(2000, 15) and (5000, 20) and $\alpha = 0.05$. The critical values of Procedure 1
are seen to be uniformly much larger than those of the generalized Hochberg
procedure and, except when $n$ is quite large, they are also larger than those
of Sarkar's k-FWER procedure.

Figure 2 presents a comparison among the above three procedures in terms
of simulated average power. We considered in this case $(n,k)$ = (100, 3),
(200, 5), (500, 8) and (1000, 10), and $\alpha = 0.05$. Each simulated average power
was obtained by: (i) generating $n$ independent normal random variables
$N(\mu_i, 1)$, $i = 1, \ldots, n$, with $n_1$ of the $n$ $\mu_i$'s being equal to 2 and the rest
0; (ii) applying Procedure 1 and the generalized Hochberg and Sarkar's
k-FWER procedures to the generated data to test $H_i: \mu_i = 0$ against $K_i: \mu_i \ne 0$
simultaneously for $i = 1, \ldots, n$ at $\alpha = 0.05$; and (iii) repeating steps (i)
and (ii) 1000 times before observing the proportion of the $n_1$ false $H_i$'s
that are correctly declared significant. As seen in this figure, Procedure 1
is uniformly much more powerful than the generalized Hochberg procedure
and substantially more powerful than Sarkar's k-FWER procedure, with the
power difference getting significantly higher with increasing number of false
null hypotheses.
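Steps (i) to (iii) can be reproduced on a small scale with the standard library alone. The sketch below is a scaled-down illustration, not the code used for the reported figures: it compares only Procedure 1 with the generalized Hochberg procedure, assumes two-sided normal p-values, uses 200 replications rather than 1000, fixes $(n, k, n_1) = (100, 3, 20)$ and a seed of 1, and all function names are hypothetical.

```python
import random
from math import comb
from statistics import NormalDist


def G(k, n, u):
    """Upper binomial tail P(Bin(n, u) >= k)."""
    return sum(comb(n, j) * u**j * (1.0 - u) ** (n - j) for j in range(k, n + 1))


def G_kn_inverse(k, n, u, iters=50):
    """Bisection inverse of t -> t * G(k-1, n-1, t) on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid * G(k - 1, n - 1, mid) <= u:
            lo = mid
        else:
            hi = mid
    return lo


def procedure1_crit(n, k, alpha):
    """Procedure 1 critical values, first k-1 set equal to the kth."""
    crit = [G_kn_inverse(k, n, i * alpha / n) for i in range(1, n + 1)]
    return [crit[k - 1]] * (k - 1) + crit[k - 1:]


def generalized_hochberg_crit(n, k, alpha):
    """k * alpha / (n + k - max(i, k)), i = 1..n."""
    return [k * alpha / (n + k - max(i, k)) for i in range(1, n + 1)]


def stepup_cutoff(pvals, crit):
    """Critical value at the largest i with p_(i) <= crit_i, or 0.0 if there is none."""
    cut = 0.0
    for p, c in zip(sorted(pvals), crit):
        if p <= c:
            cut = c
    return cut


def average_power(n, n1, crit, reps, rng):
    """Mean proportion of the n1 false nulls (mean 2) rejected over reps data sets."""
    std = NormalDist()
    total = 0.0
    for _ in range(reps):
        z = [rng.gauss(2.0 if i < n1 else 0.0, 1.0) for i in range(n)]
        p = [2.0 * (1.0 - std.cdf(abs(x))) for x in z]   # assumed two-sided test
        cut = stepup_cutoff(p, crit)
        total += sum(p[i] <= cut for i in range(n1)) / n1
    return total / reps


n, k, n1, alpha = 100, 3, 20, 0.05
power_new = average_power(n, n1, procedure1_crit(n, k, alpha), 200, random.Random(1))
power_gh = average_power(n, n1, generalized_hochberg_crit(n, k, alpha), 200, random.Random(1))
print(power_new, power_gh)
```

Both runs use the same seed, so the two procedures see identical data sets; since every critical value of Procedure 1 is at least as large as the corresponding generalized Hochberg value, its simulated power can never be smaller.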
Fig. 1. The critical constants of Procedure 1 (4.2), generalized Hochberg k-FWER
procedure (5.1) and Sarkar's k-FWER procedure (5.2) for $\alpha = 0.05$.



We brought into the above power study the other three k-FDR procedures,
Procedure 2 (with $\lambda = 0.5$), Sarkar's k-FDR procedure and the oracle k-FDR
procedure. Figure 3 presents this comparison, with Procedure 1 now labeled
New SU I and Procedure 2, Sarkar's k-FDR procedure and the oracle
procedure labeled, respectively, New SU II, Sarkar and Oracle. Benchmarking
the three k-FDR procedures, Procedures 1 and 2 and Sarkar's procedure,
against the oracle, it is seen that Procedure 1 has the best power
performance among these three when the number of false null hypotheses is small.
But, with increasing number of false null hypotheses, Procedure 2 becomes
substantially more powerful than either of the other two procedures.

Fig. 2. Comparison of average powers of k-FDR stepup procedures based on the sets of
critical values given by (4.2), (5.1) and (5.2) for $\alpha = 0.05$.



Fig. 3. Comparison of average power of k-FDR stepup procedures based on the sets of
values given by (4.2), (4.8), (5.3) and (5.4) for $\alpha = 0.05$ and $\lambda = 0.5$.

6. An application to gene expression data. Hereditary breast cancer is
known to be associated with mutations in BRCA1 and BRCA2 proteins.
Hedenfalk et al. [7] report a group of differentially expressed genes between
tumors with BRCA1 mutations and tumors with BRCA2 mutations by
analyzing one real microarray data set. The data set, which is publicly available
from the web site http://research.nhgri.nih.gov/microarray/NEJM_Supplement/,
consists of 22 breast cancer samples, among which 7 samples
are BRCA1 mutants, 8 samples are BRCA2 mutants, and the remaining
7 samples are sporadic (not used in this illustration). Expression levels, in
terms of fluorescent intensity ratios of a tumor sample to a common reference
sample, are measured for 3226 genes using cDNA microarrays. Before
processing the data, there is a preprocessing step. If any gene has one ratio
exceeding 20, then this gene is eliminated. Such preprocessing leaves
$n = 3170$ genes.

For each gene, the base 2 logarithmic transformation of the ratio was
performed before computing its two-sample t-test statistic. We then computed
its associated raw p-value by using a permutation method from [25] with the
permutation number $B = 1000$. Finally, we adjusted these 3170 raw p-values
using the following five different procedures: the three k-FDR procedures,
Procedures 1 and 2 and Sarkar's k-FDR procedure, and two k-FWER procedures,
Sarkar's k-FWER and the generalized Hochberg procedures, which
are now labeled in Table 1 New SU I, New SU II, SK k-FDR SU, SK k-FWER SU
and GH k-FWER SU, respectively. For $\alpha = 0.05$ and $\lambda = 0.9$, the
numbers of significant genes found by the above five methods are presented
in this table for different values of $k$ = 1, 3, 5, 8, 10, 15, 20 and 30.

Table 1
Numbers of differentially expressed genes for the data in [7] with $\alpha = 0.05$ and $\lambda = 0.9$

                 k = 1   k = 3   k = 5   k = 8   k = 10   k = 15   k = 20   k = 30
New SU I            74      75      81     103      124      157      173      229
New SU II          129     129     129     135      137      162      176      229
SK k-FDR SU         74      33      50      73       76       94      114      145
SK k-FWER SU         2      19      33      56       73       87      107      138
GH k-FWER SU         2       5       8      11       17       21       24       33

As expected, the k-FWER procedures are seen to be extremely conservative,
unless $k$ is chosen large, compared to the k-FDR procedures. Among the
k-FDR procedures, the two proposed ones, particularly Procedure 2, always
detect many more differentially expressed genes. The SK k-FDR SU does
not appear to be more powerful than the original BH FDR procedure unless
$k$ is large (relative to $n$); whereas, those proposed here are more powerful
for all $k$ (see Section 7 for further remarks on this).

7. Concluding remarks and additional numerical investigations. There
is currently a growing interest in developing theory and methodology of
multiple testing when the control of at least $k$ false rejections, for some
fixed $k > 1$, rather than at least one, is of importance. A number of related
procedures have been put forward in the literature, most of which are
developed generalizing the traditional FWER. The generalized notion of FDR,
the k-FDR, introduced recently in [19], on the other hand, provides a more
powerful framework in this context. This key point, though highlighted
before in [19], is re-emphasized in this paper through alternative procedures
controlling the k-FDR, at least under independence.

A procedure controlling $k$ false discoveries should get more powerful as
$k$ increases, as more and more rejections are being allowed by increasing
$k$. The k-FDR procedures proposed here have this feature, whereas the one
previously proposed in [19] does not (as seen in Table 1). Of course, one
should keep in mind that the procedure in [19] was originally developed not
for independent p-values but for dependent p-values, explicitly utilizing the
$k$th order joint distribution of the null p-values. Having said that, we must
nevertheless emphasize the point that even though our procedures are
uniformly more powerful than the corresponding FDR procedures, one should
not judge the performance of a k-FDR procedure against FDR procedures
in the context of controlling $k$ false rejections. It should be judged, as noted
in the Introduction, against compatible k-FWER and other k-FDR procedures.
In fact, the difference between k-FDR and the corresponding FDR
procedures diminishes as $n$ becomes large relative to $k$.

Choosing the value of $k$ in a k-FDR procedure is an important issue. It
could be pre-determined. For instance, in a microarray experiment involving
thousands of genes where the scientist knows that the chance of more than
one hypothesis being falsely rejected is high, he/she may find it worthwhile
to make further investigative studies once at least a given number, more than
just one, are found differentially expressed. It could also be data-driven in
that a reasonable choice of $k$ can be made only after looking at the data.
For example, suppose that we are testing 100,000 hypotheses using a method
controlling the FDR at 5% level. If 100 hypotheses are rejected, then one
might feel comfortable adjusting this procedure to one that allows a few false
rejections, say at most 9, and controls 10 or more of those at this level in an
attempt to improve the power of detecting more truly false null hypotheses.
On the other hand, if only 12 hypotheses are rejected, 10 is clearly not
a comfortable choice. In any event, the choice of $k$ should make it more
worthwhile to control the k-FDR than the FDR. Let us suppose that the
p-values are independent and one likes to use our Procedure 1 to control the
k-FDR. Notice that in this procedure the $i$th critical value $i\alpha/n$ of the BH
procedure is calibrated to the $\alpha_i$ satisfying $\alpha_i G(k-1, n-1, \alpha_i) = i\alpha/n$. Thus,
we have larger rejection thresholds and hence more power, and the factor $G$
essentially gives an idea about the choice of $k$ relative to $n$. Let $k/n \to \gamma \in (0,1)$
as $n \to \infty$. Any $\gamma > 0$ gives more power to Procedure 1 compared to the BH
procedure, but the gain in power is negligible as $\gamma \to 0$. An appropriate value
of $\gamma$ can be determined subject to a desirable amount of improvement over
the BH procedure. But, we will attempt to address it more formally in a
different communication.
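The effect of $k$ on the calibrated thresholds can be explored numerically. The sketch below is an illustration only: it solves $\alpha_i G(k-1, n-1, \alpha_i) = i\alpha/n$ by bisection and reports the ratio of $\alpha_i$ to the BH value $i\alpha/n$ for several $k$. The function name `gain_over_bh` and the choices $n = 200$, $i = 20$, $\alpha = 0.05$ are hypothetical.

```python
from math import comb


def G(k, n, u):
    """Upper binomial tail P(Bin(n, u) >= k)."""
    return sum(comb(n, j) * u**j * (1.0 - u) ** (n - j) for j in range(k, n + 1))


def calibrated_critical_value(n, k, u, iters=60):
    """Solve a * G(k-1, n-1, a) = u for a in [0, 1] by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid * G(k - 1, n - 1, mid) <= u:
            lo = mid
        else:
            hi = mid
    return lo


def gain_over_bh(n, k, alpha, i):
    """Ratio of the calibrated ith critical value to the BH value i * alpha / n."""
    u = i * alpha / n
    return calibrated_critical_value(n, k, u) / u


n, alpha, i = 200, 0.05, 20
gains = [gain_over_bh(n, k, alpha, i) for k in (1, 2, 5, 10, 20)]
print(gains)
```

The ratio equals one at $k = 1$ and grows with $k$, which gives a rough sense of how large $k$ must be relative to $n$ before the gain over the BH thresholds becomes appreciable.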

It would be interesting to see how different k-FDR procedures, including
the oracle, proposed here under the independence assumption continue to
perform in dependence cases. We did some additional simulations to investigate
this. Among different possible types of dependence, we considered the
equal correlation case. In particular, we generated 500 dependent normal
random variables with the same variance 1 and a common correlation $\rho$,
performed a multiple test using each of Procedures 1, 2 (with $\lambda = 0.5$), the
oracle and the BH procedure to test each mean at 0 against 2 using a
two-sided test, and repeated this over 1000 runs to simulate the k-FDR. The
BH procedure was included for reference. Figure 4 compares the simulated
k-FDR of these procedures with $k = 8$ and some small values of $\rho$, with
Procedures 1 and 2 labeled, respectively, New SU I and New SU II. Interestingly,
while both Procedure 1 and the oracle lose control of the k-FDR with
increasing dependence, Procedure 2 seems to hold it under dependency, at
least when the dependence is not too high.

APPENDIX: PROOFS
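A.1. Proof of Theorem 2.1. Define

$$V_{n-1}^{(-i)}(t) = \sum_{j \ne i} I(p_j \le t, H_j = 0), \qquad R_{n-1}^{(-i)}(t) = \sum_{j \ne i} I(p_j \le t).$$

Then, we note that

$$\begin{aligned}
k\text{-FDR}_n(t) &= E\left\{\frac{V_n(t)\, I(V_n(t) \ge k)}{R_n(t) \vee 1}\right\} \\
&= \sum_{i=1}^{n} \sum_{r=k}^{n} \frac{1}{r}\, E\Big\{ I(p_i \le t, H_i = 0) \\
&\qquad \times I\big[V_{n-1}^{(-i)}(t) + I(p_i \le t, H_i = 0) \ge k,\ R_{n-1}^{(-i)}(t) + I(p_i \le t) = r\big] \Big\} \\
&= \sum_{i=1}^{n} \sum_{r=k}^{n} \frac{1}{r}\, E\Big\{ I(p_i \le t, H_i = 0) \\
&\qquad \times \Pr\big[V_{n-1}^{(-i)}(t) \ge k - I(p_i \le t, H_i = 0),\ R_{n-1}^{(-i)}(t) = r - I(p_i \le t)\big] \Big\} \\
&= \pi_0 t \sum_{i=1}^{n} \sum_{r=k}^{n} \frac{1}{r} \Pr\big\{V_{n-1}^{(-i)}(t) \ge k-1,\ R_{n-1}^{(-i)}(t) = r-1\big\} \\
&= n\pi_0 t \sum_{r=k}^{n} \frac{1}{r} \Pr\{V_{n-1}(t) \ge k-1,\ R_{n-1}(t) = r-1\} \\
&= n\pi_0 t\, E\left\{\frac{I[V_{n-1}(t) \ge k-1]}{R_{n-1}(t)+1}\right\}.
\end{aligned} \tag{A.1}$$

The probability in the third line in (A.1) is obtained by taking the
conditional expectation given $I(p_i \le t)$ and $I(H_i = 0)$ of the inner indicator
function in the previous line. The fourth line follows from the fact that the
expectation of the product of $I(p_i \le t, H_i = 0)$ and a function of $I(p_i \le t)$
and $I(H_i = 0)$ is $\Pr\{p_i \le t, H_i = 0\}$ times the value of that function when
both $I(p_i \le t)$ and $I(H_i = 0)$ are 1.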

A. 2. Proof of Theorem 4.1. Let us first prove the following two results 
that will be useful in proving the theorem. 

Result A.1. The function $G(k, n, u)$ defined in (2.10) is nondecreasing
in $n$ and $u$, for any fixed $1 \le k \le n$.

Proof. Note that $G(k, n, u) = \Pr(U_{k:n} \le u)$, with $U_{k:n}$ being the $k$th
order statistic based on $n$ i.i.d. Uniform(0, 1) random variables, which is
clearly increasing in $u$ for fixed $k$ and $n$. Since the value of $U_{k:n}$ decreases
as $n$ increases, $G$ is also increasing in $n$ for fixed $k$ and $u$. □
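The monotonicity in Result A.1 can be illustrated numerically. The sketch below relies on the standard fact that the kth of n i.i.d. Uniform(0, 1) order statistics is at most u exactly when at least k of the n draws fall at or below u, which gives a binomial tail sum. The function name `G` mirrors the notation of the proof; the implementation itself is only an illustration.

```python
from math import comb


def G(k, n, u):
    """Pr(U_{k:n} <= u): probability that at least k of n i.i.d. Uniform(0,1) draws are <= u."""
    return sum(comb(n, j) * u**j * (1.0 - u) ** (n - j) for j in range(k, n + 1))


# Monotonicity in n (fixed k, u) and in u (fixed k, n), checked on a small grid.
increasing_in_n = all(G(3, n, 0.2) <= G(3, n + 1, 0.2) for n in range(3, 20))
increasing_in_u = all(G(3, 10, u / 100) <= G(3, 10, (u + 1) / 100) for u in range(0, 100))
```

Both flags evaluate to `True` on this grid, in line with the order-statistic argument.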



Result A.2. Let $R_n$ be the total number of rejections in a stepup
procedure based on p-values $p_1, \ldots, p_n$ and critical values
$\alpha_1 \le \cdots \le \alpha_n$. Let $\hat p_{1:n_0} \le \cdots \le \hat p_{n_0:n_0}$ be the
ordered p-values corresponding to the $n_0$ true null hypotheses. Then, for any
fixed $1 \le k \le n_0$,

$$
\sum_{r=k}^{n} \Pr(R_n = r \mid \hat p_{k:n_0} \le \alpha_r) \le 1.
\tag{A.2}
$$

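Proof.

$$
\begin{aligned}
\sum_{r=k}^{n} \Pr(R_n = r \mid \hat p_{k:n_0} \le \alpha_r)
&= \sum_{r=k}^{n} \Pr(R_n \ge r \mid \hat p_{k:n_0} \le \alpha_r)
 - \sum_{r=k}^{n-1} \Pr(R_n \ge r+1 \mid \hat p_{k:n_0} \le \alpha_r) \\
&= \Pr(R_n \ge k \mid \hat p_{k:n_0} \le \alpha_k) \\
&\quad + \sum_{r=k}^{n-1} \bigl[ \Pr(R_n \ge r+1 \mid \hat p_{k:n_0} \le \alpha_{r+1})
 - \Pr(R_n \ge r+1 \mid \hat p_{k:n_0} \le \alpha_r) \bigr].
\end{aligned}
$$

The result then follows from the fact that
$\Pr(R_n \ge k \mid \hat p_{k:n_0} \le \alpha_k) = 1$, because, given the occurrence
of at least $k$ false rejections, the probability that at least $k$ hypotheses are
rejected is 1, and that

$$
\Pr(R_n \ge r+1 \mid \hat p_{k:n_0} \le \alpha_{r+1}) \le \Pr(R_n \ge r+1 \mid \hat p_{k:n_0} \le \alpha_r),
\tag{A.3}
$$

for all $r = k, \ldots, n-1$, which can be proved as follows.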

Since $R_n$ decreases as each of $\hat p_{1:n_0}, \ldots, \hat p_{n_0:n_0}$ and the
nonnull p-values increases, the conditional probability

$$
g(\hat p_{1:n_0}, \ldots, \hat p_{n_0:n_0}) = \Pr(R_n \ge r+1 \mid \hat p_{1:n_0}, \ldots, \hat p_{n_0:n_0})
\tag{A.4}
$$

is nonincreasing in $\hat p_{1:n_0}, \ldots, \hat p_{n_0:n_0}$. Now, the order statistics,
say $X_{1:m} \le \cdots \le X_{m:m}$, of any set of $m$ i.i.d. (continuous) random
variables are stochastically increasing in each of their components, that is,
$E\{\phi(X_{1:m}, \ldots, X_{m:m}) \mid X_{k:m}\}$ is nondecreasing (or nonincreasing)
in $X_{k:m}$, for any fixed $1 \le k \le m$ and nondecreasing (or nonincreasing)
function $\phi$. See, for example, Block, Savits and Shaked [4], who defined this
condition as the positive dependent through stochastic ordering (PDS)
condition. It also follows from the positive regression dependence condition
satisfied by the joint distribution of order statistics; see, for example,
Karlin and Rinott [11]. Thus, the conditional probability

$$
h(\hat p_{k:n_0}) = \Pr(R_n \ge r+1 \mid \hat p_{k:n_0})
= E\{ g(\hat p_{1:n_0}, \ldots, \hat p_{n_0:n_0}) \mid \hat p_{k:n_0} \}
\tag{A.5}
$$

is nonincreasing in $\hat p_{k:n_0}$, and hence

$$
\Pr(R_n \ge r+1 \mid \hat p_{k:n_0} \le t) = E\{ h(\hat p_{k:n_0}) \mid \hat p_{k:n_0} \le t \}
\tag{A.6}
$$

is nonincreasing in $t$, which gives (A.3) since $\alpha_r \le \alpha_{r+1}$. □

We are now ready to prove the theorem. First, note that given $n_0$ true
null hypotheses with the corresponding p-values $\hat p_1, \ldots, \hat p_{n_0}$, the
$k$-FDR of a stepup procedure with critical values
$\alpha_1 \le \cdots \le \alpha_n$ under the conditions assumed in the theorem is

$$
k\text{-FDR} = n_0 \sum_{r=k}^{n} \frac{\alpha_r}{r}\, \Pr(V_{n-1} \ge k-1,\; R_{n-1} = r-1),
\tag{A.7}
$$

where $R_{n-1}$ and $V_{n-1}$ are the number of rejections and the number of false
rejections, respectively, in the stepup procedure based on the $n-1$ p-values
$\{p_1, \ldots, p_n\} \setminus \{\hat p_{n_0}\}$ and the critical values $\alpha_i$,
$i = 2, \ldots, n$. Let $\hat p_{1:n_0-1} \le \cdots \le \hat p_{n_0-1:n_0-1}$ be the
ordered $n_0 - 1$ null p-values. Then, given $\{R_{n-1} = r-1\}$, $V_{n-1} \ge k-1$
if and only if $\hat p_{k-1:n_0-1} \le \alpha_r$. Thus, the $k$-FDR in (A.7) is equal
to

$$
\begin{aligned}
& n_0 \sum_{r=k}^{n} \frac{\alpha_r}{r}\, \Pr(\hat p_{k-1:n_0-1} \le \alpha_r)\,
 \Pr(R_{n-1} = r-1 \mid \hat p_{k-1:n_0-1} \le \alpha_r) \\
&\quad = n_0 \sum_{r=k}^{n} \frac{\alpha_r}{r}\, G(k-1, n_0-1, \alpha_r)\,
 \Pr(R_{n-1} = r-1 \mid \hat p_{k-1:n_0-1} \le \alpha_r) \\
&\quad \le n_0 \sum_{r=k}^{n} \frac{\alpha_r}{r}\, G(k-1, n-1, \alpha_r)\,
 \Pr(R_{n-1} = r-1 \mid \hat p_{k-1:n_0-1} \le \alpha_r),
\end{aligned}
\tag{A.8}
$$


using Result A.1. For the stepup procedure (4.2) with its critical values
satisfying $\alpha_r G(k-1, n-1, \alpha_r) = r\alpha/n$, $r = 1, \ldots, n$, that are
increasing because of Result A.1, we have

$$
k\text{-FDR} \le \frac{n_0}{n}\, \alpha \sum_{r=k}^{n} \Pr(R_{n-1} = r-1 \mid \hat p_{k-1:n_0-1} \le \alpha_r).
\tag{A.9}
$$

The theorem then follows from Result A.2.
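The critical values used in this proof are defined only implicitly, through α_r G(k-1, n-1, α_r) = rα/n. The sketch below shows one way they could be computed, by bisection on the left-hand side, which is increasing in α_r. It is an illustration under assumptions: `G` is evaluated as a binomial tail sum, the convention G(0, n, u) = 1 is assumed (so that k = 1 returns the BH values rα/n), and the function names and the tolerance are hypothetical choices.

```python
from math import comb


def G(k, n, u):
    """Pr(U_{k:n} <= u) for n i.i.d. Uniform(0,1) draws; taken to be 1 when k <= 0 (assumed convention)."""
    if k <= 0:
        return 1.0
    return sum(comb(n, j) * u**j * (1.0 - u) ** (n - j) for j in range(k, n + 1))


def critical_values(k, n, alpha, tol=1e-12):
    """Solve a_r * G(k-1, n-1, a_r) = r*alpha/n for r = 1..n by bisection on [0, 1]."""
    values = []
    for r in range(1, n + 1):
        target = r * alpha / n
        lo, hi = 0.0, 1.0  # the left-hand side is 0 at u = 0 and 1 at u = 1
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if mid * G(k - 1, n - 1, mid) < target:
                lo = mid
            else:
                hi = mid
        values.append(0.5 * (lo + hi))
    return values
```

Because G is at most 1, each computed value is at least the corresponding BH critical value rα/n, which is one way to see why such a procedure can reject more hypotheses than BH.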

A.3. Proof of Theorem 4.2. Our proof relies on arguments used in proving
Theorem 4.1 and in [18]. Define

(A.IO) a,, = Amin{G^^i(^^^^ii^^),l|, 1 < i < j; j = 1, . . . , n. 

Consider $E(k\text{-FDP} \mid p_{j:n} \le \lambda < p_{j+1:n})$, the $k$-FDR conditional on
$p_{j:n} \le \lambda < p_{j+1:n}$. This conditional $k$-FDR is the $k$-FDR of the
stepup procedure based on $j$ independent p-values, each truncated above at
$\lambda$, and the critical values $\alpha_{ij}$, $i = 1, \ldots, j$. Noting that these
truncated p-values corresponding to the true null hypotheses are i.i.d.
Uniform(0, $\lambda$) and that



Xr \ ' 'A/ r \ X J X{n — j + 1) ' 

we see by arguing as in our proof of Theorem 4.1 [see (A.8)] that this
conditional $k$-FDR is less than or equal to
$(1-\lambda)\alpha / [\lambda(n-j+1)]\, E\{V(\lambda) \mid p_{j:n} \le \lambda < p_{j+1:n}\}$,
where $V(\lambda) = \sum_{i=1}^{n_0} I(\hat p_i \le \lambda)$. Thus, we have

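$$
\begin{aligned}
k\text{-FDR}
&= \sum_{j=k}^{n} E(k\text{-FDP} \mid p_{j:n} \le \lambda < p_{j+1:n})\, \Pr(p_{j:n} \le \lambda < p_{j+1:n}) \\
&\le \alpha \sum_{j=k}^{n} \frac{1-\lambda}{\lambda(n-j+1)}\,
 E\{V(\lambda) \mid p_{j:n} \le \lambda < p_{j+1:n}\}\, \Pr(p_{j:n} \le \lambda < p_{j+1:n}) \\
&= \alpha \sum_{i=1}^{n_0} \sum_{j=k}^{n} \frac{1-\lambda}{\lambda(n-j+1)}\,
 \Pr(\hat p_i \le \lambda,\; p_{j:n} \le \lambda < p_{j+1:n}) \\
&\le n_0 \alpha \sum_{j=k}^{n} \frac{1-\lambda}{n-j+1}\, \Pr(p_{j-1:n-1} \le \lambda < p_{j:n-1}) \\
&= \sum_{i=1}^{n_0} \sum_{j=k}^{n} \frac{\alpha}{n-j+1}\, \Pr(\hat p_i > \lambda)\,
 \Pr(p_{j-1:n-1} \le \lambda < p_{j:n-1}) \\
&\le \sum_{i=1}^{n} \sum_{j=k}^{n} \frac{\alpha}{n-j+1}\, \Pr(p_i > \lambda)\,
 \Pr\bigl(p^{(-i)}_{j-1:n-1} \le \lambda < p^{(-i)}_{j:n-1}\bigr) \\
&\le \alpha \sum_{j=k}^{n} \Pr(p_{j-1:n} \le \lambda < p_{j:n})
 = \alpha \Pr(p_{k-1:n} \le \lambda < p_{n:n}) \le \alpha,
\end{aligned}
\tag{A.11}
$$

where $p^{(-i)}_{1:n-1} \le \cdots \le p^{(-i)}_{n-1:n-1}$ denote the ordered p-values
other than $p_i$,
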

proving the theorem. 

A.4. Proof of (2.7). Using the result that two functions, one increasing
and the other decreasing, of a random variable are negatively correlated, we
first see that

$$
E\left\{ \frac{I(V_{n-1}(t) \ge k-1)}{R_{n-1}(t) + 1} \right\}
= E\left\{ \frac{\Pr(V_{n-1}(t) \ge k-1 \mid R_{n-1}(t))}{R_{n-1}(t) + 1} \right\}
\le \Pr\{V_{n-1}(t) \ge k-1\}\, E\left\{ \frac{1}{R_{n-1}(t) + 1} \right\}.
\tag{A.12}
$$

The inequality then follows by dividing both sides of (A.12) by the
probability $\Pr\{V_{n-1}(t) \ge k-1\}$.